Evaluating CHIRPS with Local Rainfall Data

This notebook evaluates the performance of CHIRPS rainfall data against local weather station observations in the Citarum Basin, Indonesia.
Author
Published

Saturday, December 14, 2024

Modified

Sunday, December 8, 2024

Keywords

CHIRPS, Citarum Watershed, Rainfall, Precipitation, Hydrology, Data Comparison, Data Analysis, Indonesia

Imagine two weather reporters, one in a satellite 🛰️ high above the Earth and one on the ground at a local weather station 📡. They’re both reporting on the same thing: rainfall 🌧️. The satellite reporter represents global, gridded rainfall datasets like CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data), which provide a broad, top-down view of rainfall patterns across vast regions. The ground reporter represents the network of local weather stations, collecting precise rainfall measurements at specific points on the Earth’s surface. Are these two reporters telling the same story about the rain? 🤔 How consistent are their reports? That’s what we’re going to explore in this notebook!

Essentially, we’ll be playing the role of fact-checkers, scrutinizing the rainfall data from both our “reporters.” We’ll use a variety of tools and techniques to analyze the data, create visualizations, and assess the reliability of each source. This will involve looking for trends, calculating statistics, and even comparing how they describe specific events, like heavy downpours. Understanding the strengths and weaknesses of both satellite-derived and ground-based rainfall data is crucial. It can help us improve hydrological models, inform water management strategies, and enhance our ability to predict and respond to extreme weather events, no matter where we are in the world. By the end of this notebook, we’ll have a clearer picture of how to interpret and utilize these different sources of rainfall information for a more comprehensive understanding of our planet’s precipitation patterns. Let’s get started!

About this notebook

This notebook provides an educational demonstration of analyzing and comparing rainfall data from different sources, with a focus on the process rather than on being a definitive research paper. It’s open-source, so you’re welcome to use and adapt it. If you find any errors or have suggestions, please help improve this resource by creating an issue on GitHub. Your input is greatly valued!

import geopandas as gpd
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import json
import myfunc
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
from IPython.display import display # noqa
import pytemplate # noqa

1 Introduction

This chapter provides the foundational context for this notebook, outlining the critical role of accurate rainfall data in effective water resource management. It introduces the two primary data sources, CHIRPS and BBWS Citarum, and describes the Citarum River Basin as the study area.

1.1 Project Background and Objectives

Rainfall is a crucial element in managing water resources, especially in a region like the Citarum River Basin. Understanding how much rain falls, where it falls, and when it falls is essential for preventing floods, managing droughts, and ensuring a reliable water supply for communities and agriculture. This notebook focuses on comparing two different sources of rainfall data: one from a global satellite-based system called CHIRPS and another from local rain gauges operated by BBWS Citarum, which we consider as the ground truth. By examining how well these two datasets agree, we can gain valuable insights into the accuracy of the satellite data and its potential for improving water management practices in the region.

The main goal of this notebook is to see how well the CHIRPS rainfall data matches up with the measurements taken from rain gauges on the ground (BBWS Citarum). We want to find out if the satellite data is consistent with the ground truth, where they might differ, and what those differences might mean for understanding rainfall patterns in the Citarum River Basin. Ultimately, this comparison will help us determine if CHIRPS data can be a reliable tool for supporting water resource management decisions, especially in areas where ground-based measurements are limited.

1.2 Data Sources

This section will briefly introduce the two datasets we’re using in this project: CHIRPS and BBWS Citarum.

  • CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data): CHIRPS is a 35+ year quasi-global rainfall dataset. Spanning 50°S-50°N (and all longitudes) and ranging from 1981 to near-present, CHIRPS incorporates the Climate Hazards Group’s in-house climatology (CHPclim), 0.05° resolution satellite imagery, and in-situ station data to create gridded rainfall time series for trend analysis and seasonal drought monitoring (see footnote 1). It’s especially helpful in areas where there aren’t many weather stations on the ground.

  • BBWS Citarum (Balai Besar Wilayah Sungai Citarum): This organization is responsible for managing water resources within the Citarum River Basin. It collects rainfall data using a network of rain gauges located throughout the basin. These rain gauges provide direct measurements of rainfall at specific points, which we consider our “ground truth” data. However, the gauges only cover specific areas within the Citarum River Basin. In this notebook, we will use rainfall data from the automatic rain gauges operated by BBWS Citarum.

We will go into more detail about how we access and process the data from each source in the next chapter (Chapter 2: Data Acquisition and Preprocessing).

1.3 Study Area

In this notebook, we’ll be exploring rainfall data from a specific area in Indonesia called the Upper Citarum River Watershed (or “DAS Citarum Hulu” in Indonesian). Think of it as our area of interest for this project! This watershed is important for managing water in West Java. It’s a fairly large area, covering about 1,738 square kilometers, which is a bit larger than Greater London and more than twice the land area of New York City.

Figure 1: Upper Citarum Watershed

Geographically, the Upper Citarum River Watershed sits between 6°45’ and 7°15’ South latitude and between 107°21’ and 107°57’ East longitude. Parts of several cities and regencies fall within this watershed, including Bandung City, Cimahi City, Bandung Regency, and Sumedang Regency. You can see the location of the watershed in Figure 1.
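For use in code, the extent quoted above is easier to handle in signed decimal degrees. The following is a minimal sketch of that conversion; the helper name `dm_to_decimal` and the variable names are illustrative and not part of the original notebook.

```python
def dm_to_decimal(degrees, minutes, hemisphere):
    """Convert degrees and minutes to signed decimal degrees (S and W are negative)."""
    value = degrees + minutes / 60.0
    return -value if hemisphere in ("S", "W") else value


# Extent quoted in the text for the Upper Citarum River Watershed.
north = dm_to_decimal(6, 45, "S")
south = dm_to_decimal(7, 15, "S")
west = dm_to_decimal(107, 21, "E")
east = dm_to_decimal(107, 57, "E")

print(f"latitude:  {south:.2f} to {north:.2f}")
print(f"longitude: {west:.2f} to {east:.2f}")
```

The resulting bounding box (about -7.25 to -6.75 in latitude and 107.35 to 107.95 in longitude) is a convenient reference when checking that a downloaded grid actually covers the watershed.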


2 Data Acquisition and Preprocessing

This chapter details the process of acquiring and preparing the rainfall data from our two sources: the satellite-based CHIRPS dataset and the ground-based measurements from BBWS Citarum rain gauges. We will outline the steps taken to download, clean, and align these datasets, ensuring they are compatible for a robust comparison within the Upper Citarum River Watershed for a defined time period. This meticulous preparation is crucial to ensure the accuracy and reliability of our subsequent analysis.

2.1 CHIRPS Data

Building upon our introduction, we now delve into the specifics of our data sources, beginning with the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS). In this section, we’ll detail how we obtained the CHIRPS data for our study area, the Upper Citarum River Watershed, from the ClimateSERV platform, a tool designed for visualizing and downloading historical and forecasted climate data. While CHIRPS data is available from various sources, we opted for ClimateSERV because of its user-friendly interface and because it delivers the data in netCDF4 format.

2.1.1 Using ClimateSERV to Obtain CHIRPS Data

For this analysis, we’ll be using the ClimateSERV platform to obtain CHIRPS rainfall data specifically for the Upper Citarum River Watershed. ClimateSERV offers a user-friendly interface for downloading pre-processed climate data. Follow these steps to get the data:

Figure 2: CHIRPS Data Download from ClimateSERV
  1. Navigate to ClimateSERV: Open your web browser and go to https://climateserv.servirglobal.net/map.

  2. Set Area of Interest (AOI):

    • On the left panel, you’ll see “Statistical Query” and “Set Area of Interest.”
    • Click on the “Upload” tab under “Set Area of Interest.”
    • You can either drag and drop your shapefile (in .zip, .json, or .geojson format) representing the Upper Citarum River Watershed boundary or click to select the file from your computer. The map on the right will automatically zoom to the uploaded shapefile, as shown in Figure 2. If you don’t have a shapefile, you can use the “Draw” option to manually draw a rectangle around the area, but this method is less precise. In this case, we use the boundary of the Upper Citarum basin, which is highlighted with a blue line.
  3. Select Data Parameters:

    • Under “Select Data,” choose “Download Raw Data” for “Type of Request.”
    • Select “Observation” for “Dataset Type.”
    • Choose “UCSB CHIRPS Rainfall” as the “Data Source.”
    • Select “NetCDF” as the “Download Format.”
  4. Specify Date Range:

    • Set the “Date Range” according to your analysis period. For this example, let’s use 2007-01-01 as the start date and 2019-12-31 as the end date.
  5. Submit Query:

    • Click the “Submit Query” button. ClimateSERV will process your request. The download will contain a NetCDF file (.nc) with CHIRPS rainfall data clipped to the Upper Citarum River Watershed for the specified period.

By following these steps, we efficiently obtain CHIRPS data that is both spatially and temporally aligned with our study area and period, ready for further processing and analysis.

2.1.2 Opening and Inspecting CHIRPS Data

Having successfully downloaded our CHIRPS rainfall data from ClimateSERV, clipped precisely to the Upper Citarum River Watershed and covering the years 2007 to 2019, we now turn to the crucial next step: opening and inspecting this NetCDF file to understand its structure and contents. In this section, we’ll use specialized libraries to load the data, explore its dimensions, variables, and attributes, and gain a firm grasp of what this satellite-derived rainfall dataset truly represents before we move forward with any further processing or analysis. Essentially, we’re ready to unpack our data and see what’s inside, ensuring we have a solid foundation for comparing it with our ground-based measurements later on.

2.1.2.1 Loading the NetCDF File

With our CHIRPS data downloaded and ready, the first step is to load it into our computing environment so we can begin exploring its contents. For this task, we’ll be leveraging the powerful xarray library in Python, which is specifically designed for working with labeled, multi-dimensional datasets like the NetCDF files commonly used in climate science. Let’s store the path to our downloaded NetCDF file, which is conveniently named chirps_2007_2019_citarum_watershed.nc, in a variable called CHIRPS_PATH for easy reference.

Now, using xarray (often imported as xr), we can load the dataset with the xr.open_dataset() function. This function neatly unpacks the NetCDF file and stores its contents—including the precipitation data, spatial coordinates, timestamps, and metadata—into an xarray Dataset object, which we’ll call chirps_ds. This object will be our primary interface for interacting with the CHIRPS data throughout this analysis.
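The loading step described above can be sketched as follows. The file name and the names `CHIRPS_PATH` and `chirps_ds` come from the text; the assumption that the file sits in the current working directory, the helper name `load_chirps`, and the existence check are illustrative additions.

```python
from pathlib import Path

import xarray as xr

# File name from the text; its location in the working directory is an assumption.
CHIRPS_PATH = Path("chirps_2007_2019_citarum_watershed.nc")


def load_chirps(path):
    """Open the CHIRPS NetCDF file as an xarray Dataset (values are read lazily)."""
    return xr.open_dataset(path)


if CHIRPS_PATH.exists():
    chirps_ds = load_chirps(CHIRPS_PATH)
    print(chirps_ds)
else:
    print(f"File not found: {CHIRPS_PATH}")
```

Printing the dataset should give a summary like the one in Table 1.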

Table 1: CHIRPS Dataset Structure
<xarray.Dataset> Size: 3MB
Dimensions:               (latitude: 10, longitude: 13, time: 4748)
Coordinates:
  * latitude              (latitude) float64 80B -7.225 -7.175 ... -6.825 -6.775
  * longitude             (longitude) float64 104B 107.4 107.4 ... 107.9 108.0
  * time                  (time) datetime64[ns] 38kB 2007-01-01 ... 2019-12-31
Data variables:
    precipitation_amount  (time, latitude, longitude) float32 2MB ...

We can now see that chirps_ds is an xarray.Dataset object in Table 1, which neatly organizes the data into dimensions, coordinates, data variables, and attributes. We can observe that our dataset has three dimensions: latitude, longitude, and time, representing the spatial and temporal components of our rainfall data. The latitude dimension has a size of 10, longitude has a size of 13, and time has a size of 4748. These dimensions correspond to the spatial grid of our data and the number of daily time steps, respectively.

In the next subsection, we will delve deeper into these dimensions and explore the coordinates associated with them. This will give us a more concrete understanding of the spatial resolution of our CHIRPS data and the exact time period it covers.

2.1.2.2 Examining Dataset Dimensions and Variables

Now that we’ve successfully loaded our CHIRPS data into an xarray Dataset, let’s take a closer look at its fundamental components: dimensions and variables.

Dimensions in xarray are like the axes of our data cube, defining the shape and size of our dataset. In the output provided earlier, we saw three dimensions:

  • latitude (latitude: 10): This dimension represents the north-south extent of our data, and it has a size of 10. This means our data covers 10 distinct latitude points.
  • longitude (longitude: 13): This dimension represents the east-west extent, with a size of 13, indicating 13 distinct longitude points.
  • time (time: 4748): This dimension represents the temporal extent, with a size of 4748. This corresponds to 4748 daily time steps, covering the period from January 1, 2007, to December 31, 2019.

Together, these dimensions tell us that our data is organized as a 10x13x4748 grid, representing a spatial grid of 10 latitudes by 13 longitudes, with each grid cell containing rainfall data for each of the 4748 days in our time period.
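As a quick sanity check on these sizes, the number of daily time steps and the total number of values can be recomputed from the date range alone. This is a small standalone sketch using only the standard library; the variable names are illustrative.

```python
from datetime import date

start = date(2007, 1, 1)
end = date(2019, 12, 31)

# Both endpoints are included, hence the +1.
n_days = (end - start).days + 1

n_lat, n_lon = 10, 13
n_values = n_days * n_lat * n_lon

print(n_days)    # 4748
print(n_values)  # 617240
```

The 13 years include three leap years (2008, 2012, and 2016), which accounts for 13 × 365 + 3 = 4748 days. The total of 617240 values matches the value count reported for the data variable in Table 2.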

Variables, on the other hand, hold the actual data values and metadata associated with each dimension. In our chirps_ds dataset, we have one primary data variable:

  • precipitation_amount (time, latitude, longitude): This variable holds the daily precipitation amount data, measured in millimeters (mm). Its shape matches the dimensions of our dataset (4748, 10, 13), meaning it contains a precipitation value for each combination of time, latitude, and longitude.

Let’s examine these components programmatically. We can access the dimensions directly from our chirps_ds object:

FrozenMappingWarningOnValuesAccess({'latitude': 10, 'longitude': 13, 'time': 4748})

This tells us, once again, that our dataset has a latitude dimension of size 10, a longitude dimension of size 13, and a time dimension of size 4748. Now that we’ve confirmed the dimensions, let’s proceed to access the precipitation_amount variable and inspect its attributes:

Table 2: CHIRPS Precipitation Amount Variable
<xarray.DataArray 'precipitation_amount' (time: 4748, latitude: 10,
                                          longitude: 13)> Size: 2MB
[617240 values with dtype=float32]
Coordinates:
  * latitude   (latitude) float64 80B -7.225 -7.175 -7.125 ... -6.825 -6.775
  * longitude  (longitude) float64 104B 107.4 107.4 107.5 ... 107.9 107.9 108.0
  * time       (time) datetime64[ns] 38kB 2007-01-01 2007-01-02 ... 2019-12-31
Attributes:
    long_name:              precipitation_amount
    units:                  mm
    accumulation_interval:  1 day
    comment:                Climate Hazards group InfraRed Precipitation with...
    cell_methods:           time: mean

Here’s what we can glean from Table 2:

  • DataArray: We’re dealing with an xarray.DataArray named precipitation_amount. This is the fundamental data structure in xarray for holding multi-dimensional labeled data.
  • Dimensions: The array has three dimensions: time (4748), latitude (10), and longitude (13), confirming what we saw earlier.
  • Data Type: The data is stored as float32, which means each precipitation value is a 32-bit floating-point number.
  • Coordinates:
    • latitude: The latitude values range from -7.225 to -6.775 degrees North; the negative sign means the grid lies south of the equator (7.225°S to 6.775°S).
    • longitude: The longitude values range from 107.375 to 107.975 degrees East.
    • time: The time values span from 2007-01-01 to 2019-12-31, stored as datetime64[ns] objects.
  • Attributes:
    • long_name: “precipitation_amount” - a descriptive name for the variable.
    • units: “mm” - indicating that the values represent millimeters of rainfall.
    • accumulation_interval: “1 day” - confirming that these are daily precipitation values.
    • comment: “Climate Hazards group InfraRed Precipitation with Stations” - providing the source of the data.
    • cell_methods: “time: mean” - indicating that the values represent the mean precipitation over each day.

In essence, Table 2 tells us that the precipitation_amount variable holds daily precipitation data in millimeters, arranged in a time-latitude-longitude grid, and provides the necessary metadata to interpret these values accurately. This detailed understanding of our core variable is crucial as we proceed to further exploration and analysis.
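The inspections in this subsection can be tried without the downloaded file. The sketch below builds a synthetic stand-in with the same layout as Table 1 (random values, not real rainfall) and queries its sizes, variable metadata, and grid spacing. The name `demo_ds` and the random data are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in with the same layout as the CHIRPS file.
# The values are random numbers, NOT real rainfall.
lat = np.linspace(-7.225, -6.775, 10)
lon = np.linspace(107.375, 107.975, 13)
time = pd.date_range("2007-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(0)

demo_ds = xr.Dataset(
    {
        "precipitation_amount": (
            ("time", "latitude", "longitude"),
            rng.random((time.size, lat.size, lon.size), dtype=np.float32),
            {"units": "mm", "accumulation_interval": "1 day"},
        )
    },
    coords={"latitude": lat, "longitude": lon, "time": time},
)

# Dimension names and sizes.
print(dict(demo_ds.sizes))

# The data variable with its dimensions, dtype, and attributes.
precip = demo_ds["precipitation_amount"]
print(precip.dims, precip.dtype, precip.attrs["units"])

# Mean spacing between neighbouring latitude cell centres, in degrees.
print(float(np.diff(demo_ds["latitude"].values).mean()))
```

The sketch uses `.sizes` to read the dimension lengths. In recent xarray versions, `.dims` on a Dataset returns the warning wrapper seen in the output above, so `.sizes` is likely the more future-proof choice. The coordinate spacing of about 0.05° is consistent with the CHIRPS resolution mentioned in the introduction.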

2.1.3 Initial Data Visualization

In this section, we’ll start visualizing our CHIRPS precipitation data to get a better sense of the spatial patterns of rainfall within the Upper Citarum River Watershed. We’ll begin with some simple plots for specific days and then move on to a more interactive map-based visualization.

2.1.3.1 Visualizing Precipitation for Specific Days

One of the quickest ways to get a visual overview of our data is to use the built-in plotting capabilities of xarray. The .plot() method, when applied to a DataArray, automatically generates a map if the data has spatial dimensions (latitude and longitude), which is the case for our precipitation_amount variable.

For our initial visualization, let’s focus on the first three days of our dataset: January 1st, 2007 to January 3rd, 2007. We can select these specific days using xarray’s .sel() method, which allows us to slice the data based on coordinate values.
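A minimal sketch of this selection is shown below. It uses a small synthetic DataArray in place of the real `precipitation_amount` variable (the random values are an assumption for illustration), so it runs without the downloaded file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic stand-in for the precipitation variable (random, not real rainfall).
time = pd.date_range("2007-01-01", periods=10, freq="D")
lat = np.linspace(-7.225, -6.775, 10)
lon = np.linspace(107.375, 107.975, 13)
rng = np.random.default_rng(42)

precip = xr.DataArray(
    rng.gamma(shape=0.5, scale=4.0, size=(time.size, lat.size, lon.size)),
    coords={"time": time, "latitude": lat, "longitude": lon},
    dims=("time", "latitude", "longitude"),
    name="precipitation_amount",
    attrs={"units": "mm"},
)

# Label-based slicing with .sel() includes both endpoints.
first_three = precip.sel(time=slice("2007-01-01", "2007-01-03"))

# Each daily slice is a 2D (latitude, longitude) array.
for day in first_three.time.values:
    daily = first_three.sel(time=day)
    print(np.datetime_as_string(day, unit="D"), daily.dims, float(daily.max()))
```

With the real data, calling `.plot()` on each daily slice would draw one map per day, as described above.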

Here are the plots for these three days:

(a) January 1st, 2007
(b) January 2nd, 2007
(c) January 3rd, 2007
Figure 3: Precipitation from January 1st, 2007 to January 3rd, 2007

Here are some observations we can make based on Figure 3:

  • Spatial Variation: We can clearly see that rainfall is not uniform across the Upper Citarum River Watershed. Different areas receive different amounts of precipitation on each day.
  • Day-to-Day Changes: The spatial patterns of rainfall change from day to day. For example, on January 1st (Figure 3 (a)), the central part seems to receive more rainfall than on the other days. Meanwhile, on January 2nd (Figure 3 (b)), the northern part of the watershed appears to have received more rainfall.
  • Rainfall Amounts: The color scale indicates the amount of rainfall in millimeters (mm). We can see that some areas receive up to 8 mm of rainfall on these days, while others receive very little or none.

2.1.3.2 Creating an Interactive Choropleth Map

For this task, we’ll use the plotly library, which provides a high-level interface for creating interactive plots, including choropleth maps. We can leverage interactive maps to explore spatial rainfall patterns with greater flexibility. A choropleth map is an effective way to visualize our precipitation data, using color gradients within defined areas – in this case, grid cells – to represent rainfall amounts.

We’ve created an interactive choropleth map for January 1st, 2007 using the plotly.graph_objects library. This map displays CHIRPS precipitation across the Upper Citarum River Watershed, with each grid cell colored according to its rainfall value. The watershed boundary is also overlaid, providing context.
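A choropleth needs a polygon for every area it colors. One way to get these for a regular grid is to turn each cell center into a square GeoJSON polygon. The sketch below shows that step with the standard library only. The helper `grid_cells_to_geojson`, the feature `id` scheme, and the 0.05° cell size are assumptions for illustration, and the notebook's own implementation may differ.

```python
def grid_cells_to_geojson(lats, lons, resolution=0.05):
    """Build a GeoJSON FeatureCollection with one square polygon per grid cell.

    lats and lons are cell-centre coordinates; resolution is the cell size in degrees.
    """
    half = resolution / 2
    features = []
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            # Counter-clockwise exterior ring, closed by repeating the first vertex.
            ring = [
                [lon - half, lat - half],
                [lon + half, lat - half],
                [lon + half, lat + half],
                [lon - half, lat + half],
                [lon - half, lat - half],
            ]
            features.append(
                {
                    "type": "Feature",
                    "id": f"{i}-{j}",  # hypothetical id scheme: latitude index, longitude index
                    "properties": {"latitude": lat, "longitude": lon},
                    "geometry": {"type": "Polygon", "coordinates": [ring]},
                }
            )
    return {"type": "FeatureCollection", "features": features}


# Cell centres matching the coordinate ranges reported in Table 2.
lats = [round(-7.225 + 0.05 * i, 3) for i in range(10)]
lons = [round(107.375 + 0.05 * j, 3) for j in range(13)]

cells = grid_cells_to_geojson(lats, lons)
print(len(cells["features"]))  # 130
```

The resulting collection could then be passed as the `geojson` argument of a plotly choropleth trace such as `go.Choroplethmapbox`, with the feature ids supplied as `locations` and the day's precipitation values as `z`.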

Figure 4: Interactive Map of Precipitation on January 1st, 2007

Interactive Features:

  • Hover: Displays the location (latitude, longitude) and precipitation value of a cell on mouse hover.
  • Zoom: Allows closer examination of specific areas.
  • Pan: Enables map movement to focus on regions of interest.

This interactive map offers a powerful way to explore spatial rainfall patterns. It could be further enhanced by adding a time-selection slider or by linking it to time-series plots for specific locations in a Dash application.

Footnotes

  1. https://www.chc.ucsb.edu/data/chirps

Citation

BibTeX citation:
@online{megariansyah2024,
  author = {Megariansyah, Taruma Sakti},
  title = {Evaluating {CHIRPS} with {Local} {Rainfall} {Data}},
  date = {2024-12-14},
  url = {https://dev.taruma.info/rf-comp-id/notebook_en.html},
  langid = {en},
}
For attribution, please cite this work as:
Megariansyah, Taruma Sakti. 2024. “Evaluating CHIRPS with Local Rainfall Data.” December 14, 2024. https://dev.taruma.info/rf-comp-id/notebook_en.html.